Maximum Split Clustering Under Connectivity Constraints

Authors

  • Mark de Rooij
  • John C. Gower

Abstract

Consider N entities to be classified (e.g., geographical areas), a matrix of dissimilarities between pairs of entities, and a graph H whose vertices correspond to these entities and whose edges join the vertices of contiguous entities. The split of a cluster is the smallest dissimilarity between an entity of this cluster and an entity outside it. The single-linkage algorithm, which ignores contiguity between entities, provides partitions into M clusters for which the smallest split of the clusters, called the split of the partition, is maximum. We study here, for every M from N−1 down to 2, the problem of partitioning the set of entities into M connected clusters (i.e., clusters whose sets of entities induce connected subgraphs of H) so that the split is maximum subject to this connectivity condition. We first provide an exact algorithm of complexity Θ(N²) for the particular case in which H is a tree. This algorithm in turn suggests a first heuristic for the general problem, and several variants of this heuristic are also explored. We then present an exact algorithm for the general case, based on the iterative determination of cocycles of subtrees and on the solution of auxiliary set-covering problems. Because solving these auxiliary problems is time-consuming for large instances, we also provide another heuristic in which they are solved approximately. Computational results obtained with the exact and heuristic algorithms are presented on test problems from the literature.
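The abstract does not spell out its algorithms. What follows is only an illustrative sketch built on our own assumptions, not the authors' method.

Entities are numbered 0..N−1 and the dissimilarities are given as a symmetric list of lists `d`. The tree H is encoded by hypothetical parent pointers, with the root's parent set to -1. The helper names (`partition_split`, `single_linkage_splits`, `tree_max_splits`) are likewise hypothetical.

The sketch makes three points:

- It computes the split of a partition directly from its definition.
- It reads the unconstrained single-linkage optimum for every M off Kruskal's merge order.
- It handles the tree case using one observation. If a connected partition has split at least s, every pair with d < s lies in one cluster, so the whole tree path between them stays uncut.

This tree routine is a simple O(N³) worst-case version, not the Θ(N²) algorithm reported in the abstract.

```python
from itertools import combinations


class DisjointSet:
    """Union-find with path halving that also counts the current components."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.components = n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
            self.components -= 1


def partition_split(d, labels):
    """Split of a partition: smallest d[i][j] over pairs in different clusters."""
    n = len(labels)
    return min(d[i][j] for i, j in combinations(range(n), 2) if labels[i] != labels[j])


def single_linkage_splits(d):
    """Unconstrained case (contiguity ignored): maximum split for every M >= 2."""
    n = len(d)
    ds = DisjointSet(n)
    best = {}
    for w, i, j in sorted((d[i][j], i, j) for i, j in combinations(range(n), 2)):
        if ds.find(i) != ds.find(j):
            # All smaller pairs are already inside clusters, so the current
            # partition into ds.components clusters has split exactly w.
            best[ds.components] = w
            ds.union(i, j)
    return best


def tree_max_splits(d, parent):
    """Connected clusters when H is a tree (hypothetical parent-pointer input).

    Pairs are forced together in increasing order of dissimilarity. Forcing
    (i, j) merges every tree edge on the i-j path, so components stay subtrees.
    The max split for M is the d of the pair whose forcing first drops the
    number of components below M.
    """
    n = len(d)

    def depth_of(v):
        k = 0
        while parent[v] != -1:
            v, k = parent[v], k + 1
        return k

    depth = [depth_of(v) for v in range(n)]
    ds = DisjointSet(n)

    def force_path(u, v):
        # Climb from the deeper endpoint until both meet at their common ancestor.
        while u != v:
            if depth[u] < depth[v]:
                u, v = v, u
            ds.union(u, parent[u])  # tree edge (u, parent[u]) must stay uncut
            u = parent[u]

    best = {}
    for w, i, j in sorted((d[i][j], i, j) for i, j in combinations(range(n), 2)):
        before = ds.components
        force_path(i, j)
        for m in range(ds.components + 1, before + 1):
            best[m] = w
    return best
```

As a usage example, take `d = [[0, 5, 1], [5, 0, 4], [1, 4, 0]]` and the path 0-1-2 (`parent = [-1, 0, 1]`):

- For M = 2, single linkage reaches a split of 4 by grouping entities 0 and 2.
- That cluster is disconnected in H, so the best connected 2-partition only reaches a split of 1.

Whenever forcing leaves more than M components, adjacent components can be merged along tree edges; merging never lowers the split.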

Similar resources

Exact and Approximation Algorithms for the Maximum Constraint Satisfaction Problem over the Point Algebra

We study the constraint satisfaction problem over the point algebra. In this problem, an instance consists of a set of variables and a set of binary constraints of the forms (x < y), (x ≤ y), (x ≠ y) or (x = y). The objective is to assign integers to variables so as to satisfy as many constraints as possible. This problem contains many important problems such as Correlation Clustering, Maxim...

Dynamical reconnection and stability constraints on cortical network architecture.

Stability under dynamical changes to network connectivity is invoked alongside previous criteria to constrain brain network architecture. A new hierarchical network is introduced that satisfies all these constraints, unlike more commonly studied regular, random, and small-world networks. It is shown that hierarchical networks can simultaneously have high clustering, short path lengths, and low ...

Deep Transductive Semi-supervised Maximum Margin Clustering

Semi-supervised clustering is a very important topic in machine learning and computer vision. The key challenge of this problem is how to learn a metric such that instances sharing the same label are more likely to be close to each other in the embedded space. However, little attention has been paid to learning better representations when the data lie on a non-linear manifold. Fortunately, deep lear...

Hierarchical image segmentation by multi-dimensional clustering and orientation-adaptive boundary refinement

In this paper we present a new multi-dimensional segmentation algorithm. We propose an orientation-adaptive boundary estimation process, embedded in a multiresolution pyramidal structure, that allows the use of different clustering procedures without spatial connectivity constraints. The presence of noise in the feature space, mainly produced by modeling errors, causes a class-overlap which can...

A novel local search method for microaggregation

In this paper, we propose an effective microaggregation algorithm to produce more useful protected data for publishing. Microaggregation is mapped to a clustering problem with known minimum and maximum group size constraints. In this scheme, the goal is to cluster n records into groups of at least k and at most 2k−1 records, such that the sum of the within-group squ...

Journal:
  • J. Classification

Volume 20

Publication date: 2003